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Abstract 


In spite of the interest in and appeal of convolution-based approaches for nonsta¬ 
tionary spatial modeling, off-the-shelf software for model fitting does not as of yet exist. 
Convolution-based models are highly flexible yet notoriously difficult to fit, even with rel¬ 
atively small data sets. The general lack of pre-packaged options for model fitting makes 
it difficult to compare new methodology in nonstationary modeling with other existing 
methods, and as a result most new models are simply compared to stationary models. 

Using a convolution-based approach, we present a new nonstationary covariance func¬ 
tion for spatial Gaussian process models that allows for efficient computing in two ways: 
first, by representing the spatially-varying parameters via a discrete mixture or “mix¬ 
ture component” model, and second, by estimating the mixture component parameters 
through a local likelihood approach. In order to make computations for a convolution- 
based nonstationary spatial model readily available, this paper also presents and describes 
the convoSPAT package for R. The nonstationary model is fit to both a synthetic data set 
and a real data application involving annual precipitation to demonstrate the capabilities 
of the package. 

Keywords: spatial statistics, nonstationary modeling, local likelihood estimation, precipita¬ 
tion, R. 


1. Introduction 


The Gaussian process is an extremely popular modeling approach in modern-day spatial and 
environmental statistics, due largely to the fact that the model is completely characterized 
by first- and second-order properties, and the second-order properties are straightforward 
to specify through widely used classes of valid covariance functions. A broad literature on 
covariance function modeling exists, but traditional approaches are mostly based on assump¬ 
tions of isotropy or stationarity, in which the covariance between the spatial process at two 
locations is a function of only the separation distance or separation vector, respectively. This 
modeling assumption is made mostly for convenience, and is rarely a realistic assumption in 
practice. As a result, a wide variety of nonstationary covariance function models for Gaus¬ 
sian process models have been developed (e.g., Sampson and Guttorp 1992; Higdon 1998; 
Damian, Sampson, and Guttorp 2001; Fuentes 2001; Schmidt and O’Hagan 2003; Paciorek 
and Schervish 2006; Calder 2008; Schmidt, Guttorp, and O’Hagan 2011; Reich, Eidsvik, 
Guindani, Nail, and Schmidt 2011; and Vianna Neto, Schmidt, and Guttorp 2014), in which 
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the spatial dependence structure is allowed to vary over the spatial region of interest. How¬ 
ever, while these nonstationary approaches more appropriately model the covariance in the 
spatial process, most are also highly complex and require intricate model-fitting algorithms, 
making it very difficult to replicate their results in a general setting. Therefore, when new 
nonstationary methods are developed, their performance is usually compared to stationary 
models, for which robust software is available. While software exists for several of these 
nonstationary approaches (see below), there are currently no pre-packaged options for fitting 
convolution-based nonstationary models. 

To address this need, we present a simplified version of the nonstationary spatial Gaussian 
process model introduced by Paciorek and Schervish (2006) in which the locally-varying geo¬ 
metric anisotropies are modeled using a “mixture component” approach, similar to the discrete 
mixture kernel convolution approach in Higdon (1998) but allowing the underlying correlation 
structure to be specified by the modeler. The model is extended to allow other properties 
to vary over space as well, such as the process variance, nugget effect, and smoothness. 
An additional degree of efficiency is gained by using local likelihood techniques to estimate 
the spatially-varying features of the spatial process; then, the locally estimated features are 
smoothed over space, similar in nature to the approach of Fuentes (2002). 

This paper also presents and describes the convoSPAT package for R for conducting a full anal¬ 
ysis of point-referenced spatial data using a convolution-based nonstationary spatial Gaussian 
process model. The primary contribution of the package is to provide accessible model-fitting 
tools for spatial data of this type, as software for convolution-based nonstationary modeling 
does not currently exist. Furthermore, the methods used by the package are computationally 
efficient even when the size of the data is relatively large (on the order of n = 1000). The 
package is able to handle both a single realization of the spatial process observed at a finite 
set of locations, as well as independent and identically distributed replicates of the spatial 
process observed at a common set of locations. Finally, the paper demonstrates how the 
package can be used, and provides analyses of both simulated and real data sets. 

As noted, there are several other (albeit non convolution-based) methods for nonstationary 
spatial modeling that do offer software, namely the basis function approach in the fields pack¬ 
age (Nychka, Furrer, and Sain 2014) and the Gaussian Markov random field approach in the 
INLA package (Lindgren, Rue, and Lindstrom 2011; Ingebrigtsen, Lindgren, and Steinsland 
2014; Fuglstad, Lindgren, Simpson, and Rue 2015; Lindgren and Rue 2015), both available 
for R. However, as these methods arise from significantly different modeling approaches, the 
convoSPAT package represents a novel contribution to the set of available software for non¬ 
stationary spatial modeling. Comparison across software for these various packages is beyond 
the scope of this paper and will be reserved for future work. 

The paper is organized as follows. Section 2 introduces a convolution-based approach for 
nonstationary spatial statistical modeling, and Section 3 describes a full model for observed 
data and the mixture component parameterization. Section 4 outlines a computationally 
efficient approach to inference for the model introduced in Section 3; Sections 5, 6, and 7 
outline usage of the convoSPAT package and present two applications. Section 8 concludes 
the paper. 
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Process convolutions or moving average models are popular constructive methods for speci¬ 
fying a nonstationary process model. In general, a spatial stochastic process Y(-) on G C 1Z d 
can be defined by the kernel convolution 


Y( s) = [ K s (u)dW(u), (1) 

Jn d 

where W (•) is a d-dimensional stochastic process and K s (-) is a (possibly parametric) spatially- 
varying kernel function centered at s £ G. Higdon (2002) summarizes the extremely flexible 
class of spatial process models defined by Equation 1: see, for example, Barry and Ver Hoef 
(1996), Ver Hoef, Cressie, and Barry (2004), Ickstadt and Wolpert (1998), Higdon (1998), 
Ver Hoef et al. (2004), and Hoef and Barry (1998). 

The kernel convolution Equation 1 defines a mean-zero nonstationary spatial Gaussian process 
(GP) if W(-) is chosen to be d-dimensional Brownian motion. A benefit of using Equation 1 
is that in this case the kernel functions completely specify the second-order properties of the 
GP through the covariance function 


Cov(y(s),y(s')) = E[V(s)V(s')] = [ K s (u)K s ,(u)du, (2) 

Jg 

where s, s' 6 G. The popularity of this approach is due largely to the fact that it is much 
easier to specify kernel functions than a covariance function directly, since the kernel functions 
only require j^ d K s (u)du < oo and f^ d (u)du < oo, while a covariance function must 
be even and nonnegative definite (Bochner 1959; Adler 1981). A famous result (Thiebaux 
1976; Thiebaux and Pedder 1987) uses a parametric class of Gaussian kernel functions in 
Equation 2 to give a closed-form covariance function; this result was later extended (Paciorek 
2003; Paciorek and Schervish 2006; Stein 2005) to show that 


C NS (s, s'; 6) = (j(s)cr(s') 


I Els' 


, 1/4 


s(s': 


11/4 


£(s)+£(s') 

2 


1/2 


s, S' 


( 3 ) 


is a valid, nonstationary, parametric covariance function on lZ d ,d > 1, when g(-) is chosen to 
be a valid correlation function on lZ d 7 d > 1. Note that Equation 3 no longer requires kernel 
functions to be specified. In Equation 3, 9 is a generic parameter vector, u(-) represents 
a spatially-varying standard deviation, S(-) is a d x d matrix that represents the spatially- 
varying local anisotropy (controlling both the range and direction of dependence), and 


Q{ s,s') 


(s — S 


,/\T 


£(s) + £(s') 


-l 


s — s 


( 4 ) 


is a Mahalanobis distance. Furthermore, choosing g(-) to be the Matern correlation function 
also allows for the introduction of «:(•), a spatially-varying smoothness parameter (Stein 2005; 
in this case, the Matern correlation function in Equation 3 has smoothness [ac(s) + k(s')]/2). 
While using Equation 3 no longer requires the notion of kernel convolution, we refer to £(•) 
as the kernel matrix, since it was originally defined as the covariance matrix of a Gaussian 
kernel function (Thiebaux 1976; Thiebaux and Pedder 1987). The covariance function in 
Equation 3 is extremely flexible, and has been used in various forms throughout the literature, 
e.g., Paciorek and Schervish (2006), Anderes and Stein (2011), Kleiber and Nychka (2012), 
Katzfuss (2013), and Risser and Calder (2015). 
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3. A nonstationary spatial Gaussian process model 

The covariance function in Equation 3 can be used to define a nonstationary spatial Gaussian 
process model using the following framework. Let {Z(s),s e G} be a spatial field defined on 
G C lZ d , where 

Z(s) = x(s) t /3 + Y(s) + e(s). (5) 

In Equation 5, the mean of the spatial field is E[Z( s)] = x(s) T /3, where x(s) is a p-vector 
of covariates for location s and (3 e 1Z P are unknown regression coefficients. Y(-) repre¬ 
sents a spatially-dependent, mean-zero Gaussian process with covariance function C NS from 
Equation 3, while e(-) represents measurement error and, given r 2 (-), is conditionally inde¬ 
pendent A/"(0, r 2 (s)). (Note: A f(a,b) denotes the univariate Gaussian distribution with mean 
a and variance b.) The spatially-referenced random components, e(-) and Y’(-), are assumed 
to be mutually independent. Finally, define 9 to be a vector of all the variance-covariance 
parameters from the Gaussian process Y (■) and error process e(-). 

Now, suppose we have observations which are a partial realization of Z(-), taken at a fixed, 
finite set of n spatial locations {sj}” =1 6 G , giving the random (observed) vector Z = 
(Z(si),..., Z(s n )). The model in Equation 5 implies that Z has a multivariate Gaussian dis¬ 
tribution, conditional on the unobserved latent process Y = (Y(si),..., Y(s n )) and all other 
model parameters: 

Z|Y, /3, 0 ~ 7V n (X/3 + Y, D(0)), (6) 

where the zth row of X is x(sj) and D(0) is a diagonal matrix with (i, i) element r 2 (sj). (Note: 
J\fq (a, B) denotes the q -variate Gaussian distribution with mean vector a and covariance B.) 
Integrating out the process Y from Equation 6, we can obtain the marginal likelihood of the 
observed data Z given all parameters, which is 

Z|/3, 6 ~ A/" n (X/3, D(0) + fi(0)), (7) 

where fi(0) has elements fiij(0) = C NS (s*, sj; 6). For a particular application, the practi¬ 
tioner can specify the underlying correlation structure (through g(-)) as well as determine 
which of (S(-), <t(-), t 2 (-)} (or /«(■), if the Matern is used) should be fixed or allowed to 
vary spatially. However, some care should be taken in choosing which quantities should be 
spatially-varying: for example, Anderes and Stein (2011) note that allowing both X(-) and 
k(-) to vary over space leads to issues with identifiability. 

3.1. Discrete mixture representation 

One way to reduce the computational demands of fitting a Gaussian process-based spatial 
model with parametric covariance function given by Equation 3 is by characterizing the non¬ 
stationary behavior of a spatial process through the discretized basis kernel approach of Hig¬ 
don (1998). Higdon (1998) estimated the Gaussian kernel function for a generic location to 
be a weighted average of “basis” kernel functions, estimated locally over the spatial region of 
interest. However, since the use of Gaussian kernel functions results in undesirable smooth¬ 
ness properties (see, e.g., Paciorek and Schervish 2006), we instead use a related “mixture 
component” approach, in which the parametric quantities for an arbitrary spatial location are 
defined as a mixture of spatially-varying parameter values associated with a fixed set of com¬ 
ponent locations. Specifically, in this new approach, we define mixture component locations 
{bfc : k = 1 ,,K} with corresponding parameters {(£/., er|, r|, Kk) '■ k = 1 ,,K} (which 
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are the kernel matrix, variance, nugget variance, and smoothness, respectively). Then, for 
4> G {S, <t 2 , r 2 , k}, the parameter set for an arbitrary location s G G is calculated as 


K 

<K s )= yzwk(s)(f> k , 

k= 1 


( 8 ) 


where 


Wk( s) oc exp 


s-b^p) 

2X W J 


(9) 


such that w k ( s ) = 1- For example, the kernel matrix for s G G is S(s) = w k( s )^k- 

In Equation 9, X w acts as a tuning parameter, ensuring that the rate of decay in the weighting 
function is appropriate for both the data set and scale of the spatial domain. Using this 
approach, the number of parameters is now linear in K, the number of mixture component 
locations, instead of n, the sample size. Furthermore, this specification still enables the 
modeler to choose which parameters should be spatially-varying: the kernel matrices, the 
process variance, the nugget variance, and the smoothness. 


3.2. Prediction 


Define Z* = (Z(s|),..., Z( sj^J) to be a vector of the process values at all prediction locations 
of interest. The Gaussian process model in Equation 5 implies that 


Z 

z* 


( 3,6 


~ Afn+m 


X/3 

X*/3 


D (0) + n(0) ci zz *(0) 1\ 
Cl z * z (0) B*{6) + Cl*(6) \) ’ 


where Cov(Z*) = D*(0) + t2*(0) and Cov(Z,Z*) = t2zz*($)- By the properties of the 
multivariate Gaussian distribution, 


Z*|Z — Z, ( 3 , 6 ~ A/" m (/iz*|z> ^z*|z)> (10) 

where 

Mz*|z = X*/3 + ^z*z(0)[D(0) + Ci(6)}-\z - X/3), (11) 

and 

S Z *| Z = [D*(0) + Cl*(6)] - Cl z * z (9)[D(6) + fi^)]"^ ZZ *(0). (12) 

Using plug-in estimates /3 and 0 (see Section 4), the predictor for Z* is then J3 Z *^ Z with 
corresponding prediction errors as the square root of the diagonal elements of Xz*| z . 


Out-of-sample evaluation criteria 

Three cross-validation evaluation criteria can be used to assess the fit of the nonstationary 
spatial model given in Equation 5. First, the mean squared prediction error 

m 

MSPE = - XU - nt, (13) 

3 =1 

where z* is the jth held-out observed (or “validation”) value and z* is the corresponding 
predictor (from Equation 11). The MSPE is a point-wise measure of model fit, and smaller 
MSPE indicates better predictions. 
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Second, to assess the prediction error relative to the standard error of each prediction, we use 
the so-called prediction mean squared deviation ratio 


pMSDR 



3 = 1 


-^) 2 


(14) 


where z* and z* are dehned as above and oj is the prediction error corresponding to z* (from 
Equation 12). 

Finally, the continuous rank probability score will be used (a proper scoring rule; see Gneiting 
and Raftery 2007). For the jth prediction, this is defined as 


CRPSj = CRPS(Fj , z*) 


/ OO 

(Fj(x) - 1{® > z*}) 2 dx , 

-OO 


(15) 


where Fj(-) is the cumulative distribution function (CDF) for the predictive distribution of 
z* given the training data and 1{-} is the indicator function. In this case, given that the 
predictive CDF is Gaussian (conditional on parameters; see Equation 10), a computational 
shortcut can be used for calculating Equation 15: when F is Gaussian with mean p and 
variance cr 2 , 


CRPS(F,z*) = a 





where (f> and <3? denote the probability density and cumulative distribution functions, respec¬ 
tively, of a standard Gaussian random variable. The reported metric will be the average over 
all validation locations, CRPS = m~ l YlyLl CRPSj. CRPS measures the fit of the predictive 
density; larger CRPS (i.e., smaller negative values) indicates better model fit. 


4. Computationally efficient inference 

As discussed in Section 1, fast and efficient inference for a nonstationary process convolution 
model has yet to be made readily available for general use. In spite of its popularity, the 
use of Equation 3 always requires some kind of constraints and has suffered from a lack of 
widespread use due to the complexity of the requisite model fitting and limited pre-packaged 
options. Focusing on the spatially-varying local anisotropy matrices S(-), the covariance 
function in Equation 3 requires a kernel matrix at every observation and prediction location 
of interest. Paciorek and Schervish (2006) accomplish this by modeling Fl(-) as itself a (sta¬ 
tionary) stochastic process, assigning Gaussian process priors to the elements of the spectral 
decomposition of FI(-); alternatively, Katzfuss (2013) uses a basis function representation of 
Fl(-). Both of these models are highly parameterized and require intricate Markov chain 
Monte Carlo methods for model fitting. 

The approach we propose provides efficiency in two ways: first, from the model itself, which 
uses a discrete mixture representation (see Section 3.1), and second, by fitting the mixture 
components of the model locally, using the idea of local likelihood estimation (Tibshirani and 
Hastie 1987). 
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4.1. Local likelihood estimation 

Using the discrete mixture representation of Equation 8, a “full likelihood” approach to pa¬ 
rameter estimation could be taken, in either a Bayesian or maximum likelihood framework, 
although the optimization in a maximum likelihood approach could become intractable for ei¬ 
ther moderately large K or large n. However, since the primary goal of this new methodology 
is computational speed, a further degree of efficiency can be gained by using local likelihood 
estimation (LLE; Tibshirani and Hastie 1987). 

Before discussing the local likelihood approach, we outline a restricted maximum likelihood 
(REML; see Patterson and Thompson 1971, Patterson and Thompson 1974, and Kitanidis 
1983) approach for separating estimation of the mean parameters (3 from the covariance 
parameters 9. The full log-likelihood for (3 and 9 in Equation 5 is 

£ f (/3, 0; Z) = ~ log |n + D| - i(Z - X/3) T (f2 + D)“ 1 (Z - X/3), (16) 

where we have abbreviated D = D(0) and Q = f 1(0)’, a standard maximum likelihood 
approach would set out to maximize C F (/3, 0; Z) directly. REML, on the other hand, uses 
a (log) likelihood based on n — p linearly independent linear combinations of the data that 
have an expected value of zero for all possible (3 and 9. Regardless of which set of linearly 
independent combinations is chosen, the “restricted” log-likelihood, which depends only on 9, 
is 

c r (9-, Z) = ~ log in + D| — ^log |x T (n + d)- x x| - Vpz, ( 17 ) 

where 

p = (n + d)- 1 - (n + D)~ 1 x(x T (n + D) _1 x) -1 x T (n + d) _1 . (is) 

The REML estimate of 6 is obtained by maximizing C R {9\ Z), and the estimate of (3 is the 
generalized least squares estimate 

3 = (X T (i7 + D)- 1 X) _1 X T (fi + D)- 1 Z, (19) 

which is obtained by plugging in 9 to calculate Q and D. These parameter estimates can 
then be plugged in to /i z »| z and S z *| z to obtain predictions and prediction standard errors. 

In the LLE approach, instead of maximizing Equation 17 directly we will set out to maximize 
C R {9^ k ] Z N k )i where N k = N k {r) is a neighborhood for each mixture component location b*. 
that depends on the radius r, such that 

N k = {si <E {si,... ,s n } : {||sj - b fc || < r}} 

and 

ZiVfc = {Z(s) :s£ iVfc}. 

Correspondingly, 9j\r k = (Xfc, cr|, r|, k&). The radius r defines the “span” (Tibshirani and 
Hastie 1987) or window size for each mixture component. The restricted log-likelihood for 
neighborhood N k will be based on a stationary version of the spatial model in Equation 5, 
namely 

Z(s) = x(s) T 3 + T(s) + ?(s), (20) 

where Y(-) is a stationary, mean-zero spatial process with covariance function 

C S { s-s') = cr 2 g (j|X -1/2 (s- s')||) , 


(21) 
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the ?(•) are independent and identically distributed as jV( 0 , r 2 ), conditional on r 2 , and again 
y(-) and ?(•) are independent. Again, in a REML framework, only the variance and covariance 
parameters {£&, u 2 , t|, need to be estimated for each k = l,...,iL. No estimates will 
be obtained for the local mean coefficient vector (3, as all of the mean parameters will be 
estimated globally. 

One final note regarding the estimation of the kernel matrices: the kernel matrix for mixture 
component location k will be obtained through estimating the parameters of its spectral 
decomposition, namely Ai, A 2 , and 77 , where 


cos ( 77 ) 

— sin(7/) 

Ai 

0 

cos(t7) 

sin(r 7 ) 

sin(r 7 ) 

cos(? 7 ) 

0 

A 2 

— sin(ri) 

cos(t7) 


(in the case that we have fixed d = 2). Here, Ai and A 2 are eigenvalues and represent squared 
ranges (such that Ai > 0 and A 2 > 0 ) and 77 represents an angle of rotation, constrained to 
lie between 0 and n/2 for identifiability purposes (Katzfuss 2013). 

The full model in Equation 5 can be fit after plugging REML estimates {Xh, r|, ■ k = 

1,... ,K} into the covariance function in Equation 3 using the discrete basis representation 
in Equation 8 to calculate the likelihood for the observed data. Variance quantities that 
are not specified to be spatially-varying can then be estimated again using REML with the 
spatially-varying components considered fixed. For example, if for a particular model only 
S(-) is allowed to vary spatially and the smoothness is fixed, it remains to estimate the overall 
nugget r 2 and variance cr 2 . The restricted Gaussian likelihood for these parameters is then 

£ W 2 ; Z, R) = ~ log |a 2 R + r 2 I n | - ^ log |X T (a 2 R + r 2 I n ) _1 X| - ^Z T PZ, 

where R is the correlation matrix, i.e., the matrix calculated using Equation 3 without the 
<t(-) terms, and P is defined as in Equation 18. Once all of the covariance parameters have 
been estimated, the estimate of /3 can be calculated as in Equation 19. 

Using this model requires both the number and placement of mixture component locations 
{bfc : k = 1 ,,K}, selecting which of the spatial dependence parameters should be fixed or 
allowed to vary spatially, the tuning parameter for the weighting function A^, and the fitting 
radius r. Parameter estimates for this model are likely to be sensitive to the choice of K and 
the placement of mixture component locations. Furthermore, Tibshirani and Hastie (1987) 
discuss the importance of choosing r, which specifies the “span size,” suggesting that the model 
should be fit using a range of r values, and use a global criterion such as the maximized overall 
likelihood, cross-validation, or Akaike’s Information Criterion to choose the final model. This 
strategy could either be implemented on a trial-and-error basis or in an automated scheme. 
Of course, regardless of the number and locations of the mixture component centroids, the 
radius r should be chosen such that a large enough number of data points are used to estimate 
a local stationary model. 

While different in both motivation and nature, the model outlined above is related to the 
local likelihood method described in Anderes and Stein (2011), which ties together locally 
stationary models to estimate a globally nonstationary model. The model in Anderes and 
Stein (2011) involves optimizing a sum of weighted increments of local log-likelihoods, where 
the weights are estimated smoothly using a smoothing kernel. Alternatively, our approach 
estimates spatially-varying parameters locally using only a subset of the data, then fixing 
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the global parameters according to Equation 8. Both of these approaches avoid the lack-of- 
smoothness issues innate to other similar segmentation approaches, such as Fuentes (2001) or 
the ad hoc nonstationary kriging approach in Paciorek and Schervish (2006), which Anderes 
and Stein (2011) call “hard thresholding” local likelihood estimates. Like Anderes and Stein 
(2011), our approach avoids the problem of non-smooth local parameter estimates implicit to 
hard thresholding methods by using the mixture component representation. 

We conclude this overview of our methodology with a note regarding the computational de¬ 
mands of fitting this model. Recall that Gaussian likelihood calculations such as Equation 16 
typically involve inverting and calculating the determinant ofnxn matrices, requiring 0(?i 3 ) 
calculations (in this case, for each iteration of the optimization procedure). The local like¬ 
lihood approach involves inverting a collection of K x n*, matrices, so that the model 
requires more like 0(Kn 3 ) calculations, where n = /\ _1 Ufc n fc is the average local sample 
size. This represents a significant reduction in both the required memory and CPU time, 
so long as n « n. However, note that since the local models are fit independently of each 
other, if parallelization is utilized (see Section 8) then the computational time could be further 
reduced to 0(n 3 ). 


5 . Using the convoSPAT package for R 

The convoSPAT package (version 1.1.1) is available CRAN, and can be installed as usual: 

R> install.packages( "convoSPAT" ) 

R> library( "convoSPAT" ) 

All of the data sets in Sections 6 and 7 are included in the package. The convoSPAT package 
uses functionality from the ellipse (Murdoch and Chow 2013), fields (Nychka et al. 2014), 
geoR (Ribeiro Jr. and Diggle 2015), MASS (Venables and Ripley 2002), plotrix (Lemon 2006), 
sp (Bivand, Pebesma, and Gomez-Rubio 2013; Pebesma and Bivand 2005), and StatMatch 
(D’Orazio 2016) packages for R. 

Two notes should be made before discussing the functionality of the package. First, while 
the methods described in Sections 2, 3, and 4 are valid for spatial coordinates in a general 
d-dimensional Euclidean space, the following implementation only allows for two-dimensional 
coordinates, with d = 2. Second, on a more technical note, the package is implemented using 
the S3-classes of R, and therefore contains package-specific predict and plot functionality. 

5.1. Nonstationary model fitting 

The primary components of the convoSPAT package are the NSconvo_fit and 
predict. NSconvo functions which fit the nonstationary model discussed in Section 3 and 
provide predictions, respectively. The NSconvo_fit function takes the following arguments 
(with defaults as given): 

NSconvo_fit( geodata = NULL, sp.SPDF = NULL, 
coords = geodata$coords, data = geodata$data, 

cov.model = "exponential", mean.model = data ~ 1, me.locations = NULL, 

N.mc = NULL, me.kernels = NULL, fit.radius, lambda.w = NULL, 
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ns.nugget = FALSE, ns.variance = FALSE, local.pars.LB = NULL, 
local.pars.UB = NULL, global.pars.LB = NULL, global.pars.UB = NULL, 
local.ini.pars = NULL, global.ini.pars = NULL ) 

The spatial coordinates and response variable of interest may be input in several different 
ways: first, the function accepts a geodata object from the geoR package (Ribeiro Jr. and 
Diggle 2015), using the geodata input; second, a SpatialPointsDataFrame object from the 
sp package (Bivand et al. 2013; Pebesma and Bivand 2005), using the sp.SPDF input; finally, 
the coordinates and data may be entered directly using the coords and data inputs. As 
a result, the package is able to handle a variety of geographic coordinate reference systems; 
note, however, that distances between points are calculated using a Mahalanobis distance (see 
Equation 4). 

Two other required inputs are the number of mixture component locations (N.mc) and the 
fit.radius (previously denoted r). The user may specify a covariance model from the 
geoR (Ribeiro Jr. and Diggle 2015) options cauchy, matern, circular, cubic, gaussian, 
exponential, spherical, or wave, as well as a mean model through the usual formula no¬ 
tation (a constant mean is the default). For most applications (and as an alternative to 
specifying N.mc), the user will want to specify the mixture component locations directly: the 
default is to create an evenly spaced grid over the spatial domain of interest, which may not be 
appropriate if the spatial domain is non-rectangular. The tuning parameter for the weighting 
function is defined by lambda, w. The default for X w is fixed to be the square of one-half 
of the minimum distance between mixture component locations, or (0.5 min{| |bfc -V||}) 2 
(in order to ensure a default scaling appropriate for the resolution of the mixture component 
grid), but may also be specified by the user. The user may also specify if either the nugget 
variance or process variance is to be spatially-varying by setting either ns. nugget = TRUE 
or ns.variance = TRUE (or both). If the mixture component kernels themselves are pre¬ 
specified (e.g., based on expert opinion), these may also be passed into the function, which 
will greatly reduce computational time. 

Note that if the data and coordinates are not specified as a geodata object, the data argument 
for this function can accommodate replicates. This might be of interest for applications 
similar to the ones in Sampson and Guttorp (1992), in which the replicates represent repeated 
observations over time that have been temporally detrended. In this case, the model will 
assume a constant spatial dependence structure over replicates (time) as well as the same mean 
function over replicates (that is, the locations must be constant across replicates; furthermore, 
the regression coefficients will be constant across replicates). 

The optimization method used within optim for this package is "L-BFGS-B", which allows for 
the specification of upper and lower bounds for each parameter with respect to the optimiza¬ 
tion search. The upper and lower bounds may be passed to the function via local .pars. LB, 
local .pars .UB, global .pars. LB, and global .pars .UB. The local limits require vectors of 
length five, with bounds for the local parameters Ai, A 2 , t 2 , a 2 , and while the global limits 
require vectors of length three, with bounds for the global parameters r 2 , a 2 , and n. Default 
values for these limits are as follows: for both the global and local parameter estimation, the 
lower bounds for Ai, A2, u 2 , t 2 , and k are fixed at le-5; the upper bound for the smoothness 
k will be fixed to 30. The upper bounds for the variance and kernel parameters, on the other 
hand, will be specific to the application: for the nugget variance (r 2 ) and process variance 
(<r 2 ), the upper bound will be Vd^ )LS (where is the error variance estimate from a stan- 
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dard ordinary least squares procedure); the upper bound for Ai and X 2 will be one-fourth of 
the maximum interpoint distance between observation locations in the data set. The bounds 
for rj are fixed at 0 and ir/2. 

Given that many calls to optim are made within NSconvo_f it, the function prints a message 
to notify the user if optim returns any errors. In test runs of the package, the most common 
warning (non-fatal) message encountered is "ABNORMAL_TERMINATION_IN_LINSRCH", which 
seems to have no negative impact on the results of the optimization. 

The final options in the NSconvo_fit function involve local. ini .pars and 
global. ini .pars, which specify the initial values used for the local and global calls 
of optim, respectively. As with the limits, local. ini. pars is a vector of length five, with 
initial values for the local parameters Ai,A 2 ,t 2 , a 2 , and k , while global. ini. pars is a 
vector of length three, with initial values for the global parameters r 2 , a 2 , and n. The default 
for these inputs are as follows: = A 2 ,init = c/10, where c is the maximum interpoint 

distance between observation locations, r 2 nit = 0.1 5q L iS , a init = 0.9 &OLS’ an d K mit = 1- 
When the NSconvo_fit function is called, the status of the model fitting will be printed on 
the screen. As the function fits the locally stationary models for each mixture component 
location, the function will print the mixture component and number of observations that are 
currently being used for estimation. After the local models have been fit for each mixture 
component location, a printed message will notify the user that the variance parameters are 
being estimated globally (if applicable). 

A function which may be helpful before running NSconvo_fit is the mc_N function, which 
returns the number of observations which will be used to fit each local model for a particular 
set of mixture component locations and fit radius. This function may be helpful when selecting 
the fit radius, as the user may want to know how many data points will be used to fit the 
local model for a number of different fit radii. The inputs to the function are 

mc_N( coords, me.locations, fit.radius ) 

where coords are the observation locations for the full data set, me. locations are the mixture 
component locations, and fit. radius is the fitting radius. This function is also automatically 
run inside the NSconvo_fit wrapper, printing a warning message if any of the local sample 
sizes are less than 5. In this case, the user should refine the mixture component grid or expand 
the fit radius. 

After the model fitting has completed, NSconvo_fit returns a NSconvo object which can 
be passed to the summary.NSconvo function to quickly summarize the fitted model. Among 
other elements, a NSconvo object includes: 

me.kernels, which contains the estimated kernel matrices for the mixture component 
locations, 

me. locations, which contains the mixture component locations, 

MLEs. save, which includes a data frame of the locally-estimated stationary model pa¬ 
rameters for each mixture component location, 

kernel. ellipses, which includes the estimated kernel ellipse for each location in 
coords, 
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beta.GLS, the generalized least squares estimate of /3, 
beta.cov, the estimated covariance matrix of /3, 

tausq. est, the estimate of the nugget variance - either a constant (if estimated globally) 
or a vector with the estimated nugget variance for each location in coords, 

sigmasq.est, the estimate of the process variance - either a constant (if estimated 
globally) or a vector with the estimated process variance for each location in coords, 

kappa.MLE. the global estimate of the smoothness (for cauchy or matern), 

Cov.mat and Cov.mat. inv, the estimated covariance matrix for the data and its inverse 
(respectively), and 

Xmat, the design matrix for the mean model. 

The NSconvo object can also be passed to the predict .NSconvo function, which calculates 
predictions Pz*\z an d prediction standard errors £zq z . The predict .NSconvo function takes 
the following arguments: 

predict.NSconvo( object, pred.coords, pred.covariates = NULL, ... ) 

The object is the output of NSconvo_fit, pred. coords is a matrix of the prediction loca¬ 
tions of interest, pred. covariates is a matrix of covariates for the prediction locations (the 
intercept is added automatically), and . . . allows other options to be passed to the default R 
predict methods. Calculating the predictions when the dimension of pred.coords is large 
is computationally expensive, and a progress meter prints while the machine is working. The 
output from predict. NSconvo includes: 

pred.means, which contains the kriging predictor for each prediction location, and 
pred.SDs, which contains the corresponding prediction standard error. 


5.2. Anisotropic model fitting 

For the sake of comparison, the functions Aniso_fit and predict. Aniso are also provided, 
which fit the stationary (anisotropic) model with the covariance function given by Equation 21 
to the full dataset. Note: the following functions have been coded from scratch and represent 
reimplementations of methodology that already exists in other packages, such as geoR (Ribeiro 
Jr. and Diggle 2015). In spite of being reimplementations, the functions are provided for 
convenience, as their syntax matches NSconvo_f it and can be quickly implemented in tandem 
with NSconvo_f it. However, also note that the version provided here has not been optimized 
nearly to the extent of the version in geoR, and therefore Aniso_fit will take significantly 
longer than a corresponding implementation of, say, likfit in geoR. 

The Aniso_fit function takes the following arguments (with defaults as given): 
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Aniso_fit( geodata = NULL, sp.SPDF = NULL, 

coords = geodata$coords, data = geodata$data, 

cov.model = "exponential", mean.model = data ~ 1, local.pars.LB = NULL, 
local.pars.UB = NULL, local.ini.pars = NULL ) 

The inputs to this function are identical to the corresponding inputs to the nonstationary 
model fitting function, and the output is an Aniso object. Among other elements, an Aniso 
object includes: 

MLEs.save, which includes a data frame of the locally-estimated stationary model pa¬ 
rameters for each mixture component location, 

beta.GLS, the generalized least squares estimate of (3, 

beta, cov, the estimated covariance matrix of /3, 

aniso.pars, the global estimate of the anisotropy parameters Ai, A 2 , and 77 , which 
define the anisotropy matrix in Equation 22, 

aniso.mat, which gives the global estimate of S from Equation 22 in matrix form, 
tausq.est, the global estimate of the nugget variance, 
sigmasq.est, the global estimate of the process variance, 

kappa.MLE : the global estimate of the smoothness (for cauchy or matern), and 

Cov. mat and Cov. mat. inv, the estimated covariance matrix for the data and its inverse 
(respectively) 

Once the anisotropic model has been fit and the output stored, the fitted model object can be 
passed to the predict .Aniso function, which (similar to the nonstationary predict function) 
calculates predictions p z »| z an d prediction standard errors X z *| z . The arguments are again 
identical to the nonstationary predict function: 

predict.Aniso( object, pred.coords, pred.covariates = NULL, ... ) 

The object is the output of Aniso_fit, pred.coords is a matrix of the prediction locations 
of interest, and pred. covariates is a matrix of covariates for the prediction locations (the 
intercept is added automatically). Similar to the nonstationary predict function, the output 
from predict. Aniso includes: 

pred.means, which contains the kriging predictor for each prediction location, and 
pred.SDs, which contains the corresponding prediction standard error. 


5.3. Evaluation criteria and plotting functions 

This package includes a function to quickly calculate the evaluation criteria described in 
Section 3.2, as well as functions to visualize various components of the nonstationary model. 
First, the evaluate_CV function calculates the MSPE, from Equation 13, the pMSDR, from 
Equation 14, and the CRPS, from Equation 15. The function inputs are simply 
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evaluate_CV( holdout.data, pred.mean, pred.SDs ) 

where holdout. data is the held-out validation data and pred.mean and pred. SDs are output 
from one of the fit functions. Note that the user must perform the subsetting of the data. The 
output of evaluate_CV is simply the MSPE, pMSDR, and CRPS, averaged over all hold-out 
locations. 

Next, plotting functions are provided to help visualize the output of either the stationary or 
nonstationary model. The first is plot .NSconvo: 

plot.NSconvo( x, plot.ellipses = TRUE, fit.radius = NULL, 
aniso.mat = NULL, true.me = NULL, ref.loc = NULL, 
all.pred.Iocs = NULL, grid = TRUE, true.col = 1, 
aniso.col = 4, ns.col = 2, plot.me.Iocs = TRUE, ... ) 

which plots either the estimated anisotropy ellipses (plot. ellipses = TRUE) or the estimated 
correlation (plot. ellipses = FALSE). The x is a NSconvo object; additional options can be 
added to a plot of the estimated anisotropy ellipses by specifying the fit.radius that was 
used to fit the model, the ellipse for the stationary model aniso.mat estimated in Aniso_f it, 
and the true mixture component ellipses (if known). To plot the estimated correlation, a 
reference location (ref.loc) must be specified, as well as all of the prediction locations of 
interest (all.pred.Iocs) and whether or not the predictions lie on a rectangular grid. The 
other options correspond to color values for the true mixture component ellipses (true . col), 
the anisotropy ellipse (aniso.col), and the estimated mixture component ellipses (ns. col); 
plot.me.Iocs indicates whether or not the mixture component locations should be plotted. 
Note that the plotted ellipse is the 0.5 probability ellipse for a bivariate Gaussian random 
variable with covariance equal to the kernel matrix. 

Similarly, the plot.Aniso function is provided to plot the estimated correlations from the 
stationary model. The function is 

plot.Aniso( x, ref.loc = NULL, all.pred.Iocs = NULL, grid = TRUE, ... ) 

The inputs are identical to the plot.NSconvo function, except that the object must be of 
the Aniso class. 

5.4. Other functions 

Two additional functions are also provided to simulate data from the nonstationary model 
discussed in Section 3, and are used to create the simulated data set in Section 6. First 
is f _mc_kernels, which calculates the true mixture component kernel matrices through a 
generalized linear model for each component of the kernel matrices’ spectral decomposition 
as given in Equation 22. The function, with default settings, is 

f_mc_kernels( y.min = 0, y.max = 5, x.min = 0, x.max = 5, N.mc = 3~2, 
laml.coef = c( -1.3, 0.5, -0.6 ), lam2.coef = c( -1.4, -0.1, 0.2 ), 
logit.eta.coef = c( 0, -0.15, 0.15 ) ) 

The inputs y.mon, y.max, x.min, and x.max define a rectangular spatial domain, N.mc is the 
number of mixture component locations, and laml.coef, lam2.coef, and logit. eta. coef 
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define regression coefficients for the spatially-varying parameters Ai, A 2 , and rj. For example, 
taking laml.coef = (3 X = (/5q 1 , ) T , the first eigenvector for an arbitrary location 

s = (si, S 2 ) is 

Ai(s) = expl/lg 1 + + /? 2 1 S 2 }- 

The other eigenvalue is calculated similarly. The angle of rotation, on the other hand, is 
calculated through the scaled inverse logit transformation 

V(s) = | ' l°git 1 (Pq + P 1 S 1 + P 2 S 2 ), 

where logit. eta. coef = /3, ; = (/3 q , /3^, P^) T ■ The default coefficients are those used to 
generate the simulated data set in Section 6, and were obtained by trial and error. The 
output of this function includes 

me. locations, which contains the mixture component locations, and 

me. kernels, which contains the mixture component kernels for each mixture component 
location. 

The true mixture component kernel matrices generated in f_mc_kernels can be used to 
simulate a data set using the function 

NSconvo_sim( grid = TRUE, y.min = 0, y.max = 5, x.min = 0, x.max = 5, 

N.obs = 20"2, sim.locations = NULL, me.kernels.obj = NULL, 
me.kernels = NULL, me.locations = NULL, tausq = 0.1, sigmasq = 1, 
beta.coefs = 4, kappa = NULL, covariates = rep( 1, N.obs ), 
cov.model = "exponential" ) 

In this function, grid is a logical input specifying if the simulated data should lie on a 
grid (TRUE) or not (FALSE), y.min, y.max, x.min, and x.max define the rectangular spatial 
domain, N.obs specifies the number of observed locations, me .kernels. obj is an object from 
f_mc_kernels, tausq, sigmasq, beta.coefs, and kappa specify the true parameter values, 
covariates specifies the design matrix for the mean function, and cov.model specifies the 
covariance model. The output of this function is 

sim.locations, which contains the simulated data locations, 
me. locations, which contains the mixture component locations, 

me. kernels, which contains the mixture component kernels for each mixture component 
location, 

kernel. ellipses, which contains the kernel matrices for each simulated data location, 
Cov. mat, which contains the true covariance matrix of the simulated data, and 
sim.data, which contains the simulated data. 
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Figure 1: Left: true mixture component ellipses with observation locations (red) and valida¬ 
tion locations (green). Right: simulated data. 

6. Example 1: Simulated data 

As a simple illustration, the nonstationary model will be fit to an artificial data set simulated 
from the model. The data lie on a 25 x 25 grid (so that n = 25 2 = 625), and there are 
K = 9 mixture component locations with corresponding mixture component ellipses as given 
in Figure 1. Only the kernel matrices are allowed to vary spatially, an exponential correlation 
structure is used, and the mean structure contains the main effects of both coordinates. The 
true parameter values are r 2 = 0.1, o 2 = 1, /3o = 4, /?i = —0.5 (^-coordinate coefficient), 
/?2 = 0.5 (y-coordinate coefficient), and X w = 2. A total of rn = 60 of the simulated data 
points are used as a validation sample. Figure 1 also provides the simulated data along with 
the validation locations. 

The simdata object includes sim.locations, the simulated data locations; me.locations, 
the mixture component locations; me.kernels, the true mixture component kernel matrices; 
sim. data, the simulated data; and holdout. index, a vector of the 60 randomly sampled 
validation location indices. Note that there are actually ten independent and identically 
distributed replicates of the data contained in the ten columns of sim.data; in what follows 
the first column of data will be used. 

Figure 1 can be created with 

R> plot( simdata$mc.locations, pch = "+", asp = 1.25, xlab = 

+ ylab = xlim=c( -0.5, 5.5 ), cex = 2, col = "#4575b4", main = " " ) 

R> points( simdata$sim.locations[ -simdata$holdout.index, ], col = "#d73027", 
+ pch = "+" ) 

R> points( simdata$sim.locations[ simdata$holdout.index, J, col = "#5aae61" ) 
R> for( i in l:dim( simdata$mc.locations )[1] ){ 

+ lines( ellipse( simdata$mc.kernels[ , , i J, 

+ centre = simdata$mc.locations[ i, J, level = 0.5 ) ) 
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R> ggplot( data.frame( simdata = simdata$sim.data[ , 1 ], 

+ xcoord = simdata$sim.locations[ , 1 J, 

+ ycoord = simdata$sim.locations [ ,2 ] ), 

+ aes( x = xcoord, y = ycoord, color = simdata ) ) + 

+ coord_fixed( ratio = 1.25 ) + geom_point( size = 2.5 ) + 

+ xlab( "" ) + ylab( "" ) + name = "Simulated \nData") + 

+ scale_color_gradientn( colours = brewer,pal( 11, "RdYIBu" ), 

+ theme( panel.background = element_rect( fill = "white" ), 

+ panel.border = element_rect( colour = "black", fill = NA ), 

+ panel.grid = element_blank(), legend.key.height = unit( 2, "cm" ), 

+ legend.title = element_text( size = 16 ), 

+ legend.text = element_text( size = 15 ) ) 

(Note: this code requires the ggplot2 (Wickham 2009), RColorBrewer (Neuwirth 2014), and 
ellipse (Murdoch and Chow 2013) packages in R.) 

6.1. Selection of fixed components in the model 

In order to fit the nonstationary model, the user must specify three components: (1) the 
number and placement of mixture component locations, (2) the fitting radius r, and (3) the 
tuning parameter \ w . For this simulated data set, the true number and placement of the 
mixture component locations as well as the tuning parameter are known, and these true 
values will be used. However, even for the simulated data, an optimal fit radius is not known. 
When the tuning parameter is unknown the package provides a default value, but this may 
or may not be the optimal choice. 

Given that the nonstationary model can be fit relatively quickly for given values of r, a 
recommended strategy is to fit the model many times for a variety of different values (same 
for the mixture component grid and A^,, when these are unknown) and choose the final values 
based on which combination yields the “best” results. Of course, there is a trade-off between 
choices of the mixture component grid and fit radius: for a particular grid, the radius should 
be chosen such that a reasonable number of data points are used to fit each locally stationary 
model. It is here that the mc_N function may be helpful, because the user can quickly determine 
how many locations fall within the fitting radius of each grid point. For example, using the 
simulated data: 

R> mc_N( coords = simdata$sim.locations [ -simdata$holdout.index, ], 

+ me.locations = simdata$mc.locations, fit.radius = 2 ) 

[1] 164 212 165 209 269 212 153 204 157 

That is, using the true mixture component grid and a fit radius of r = 2, the local models 
will be fit using anywhere from 157 to 269 data points. 

For the simulated data in this application, the “best” fit was chosen based on the quality of 
the anisotropy parameter estimates relative to the true values; in other cases not involving 
simulated data, “best” might be defined in terms of the cross-validation criteria (see, e.g., 
Section 7). The values used for these three components in the final fit (Section 6.2) were the 
true mixture component grid (with K = 9 evenly spaced locations over the domain), a fit 
radius of r = 2.3 units, and the true tuning parameter value of \ w = 2. 
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True value 

Stationary model 

Nonstationary model 

/?o (intercept) 

4 

4.039 

3.905 

f3 1 (x-coordinate) 

-0.5 

-0.695 

-0.678 

/?2 (y-coordinate) 

0.5 

0.741 

0.770 

t 2 (nugget) 

0.1 

0.086 

0.107 

<7 2 (variance) 

1 

1.038 

0.958 

CRPS 

- 

-0.424 

-0.415 

MSPE 

- 

0.546 

0.524 


Table 1: Parameter estimates from the simulated data, comparing the stationary and non¬ 
stationary models. 


6.2. Final model fitting 

Using the final choices of the fixed model components, the nonstationary model can be fit to 
the non-hold-out data (and results summarized) by 

R> NSfit.model <- NSconvo_fit( 

+ coords = simdata$sim.locations [ -simdata$holdout.index, ], 

+ data = simdata$sim.data[ -simdata$holdout.index, 1 J, 

+ cov.model = "exponential", fit.radius = 2.3, lambda.w = 2, 

+ me.locations = simdata$mc.locations, 

+ mean.model = simdata$sim.data[ -simdata$holdout.index, 1 J 
+ ~ simdata$sim.locations [ -simdata$holdout.index, 1 J 

+ + simdata$sim.locations[ -simdata$holdout.index, 2 ] ) 

R> summaryC NSfit.model ) 

Similarly, the anisotropic model can be fit to the non-hold-out data (and results summarized) 
by 

R> anisofit.model <- Aniso_fit( 

+ coords = simdata$sim.locations [ -simdata$holdout.index, J, 

+ data = simdata$sim.data[ -simdata$holdout.index, 1 J, 

+ cov.model = "exponential", 

+ mean.model = simdata$sim.data[ -simdata$holdout.index, 1 J 
+ ~ simdata$sim.locations[ -simdata$holdout.index, 1 J 

+ + simdata$sim.locations[ -simdata$holdout.index, 2 J ) 

R> summaryC anisofit.model ) 

Predictions for the validation locations under each model can be calculated by 
R> pred.NS <- predict( NSfit.model, 

+ pred.coords = simdata$sim.locations [ simdata$holdout.index, ], 

+ pred.covariates = simdata$sim.locations[ simdata$holdout.index, ] ) 

R> pred.S <- predict( anisofit.model, 

+ pred.coords = simdata$sim.locations[ simdata$holdout.index, ], 
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+ pred.covariates = simdata$sim.locations[ simdata$holdout.index, ] ) 


(note: here, predict is an alias for predict. NSconvo and predict. Aniso for objects of 
these classes', similar syntax will be used throughout) after which the evaluation criteria can 
be calculated by 

R> evaluate_CV( holdout.data = simdata$sim.data[ simdata$holdout.index ], 

+ pred.mean = pred.NS$pred.means, pred.SDs = pred.NS$pred.SDs ) 

R> evaluate_CV( holdout.data = simdata$sim.data[ simdata$holdout.index J, 

+ pred.mean = pred.S$pred.means, pred.SDs = pred.S$pred.SDs ) 

A summary of the parameter estimates and cross-validation results (MSPE and CRPS only) 
from each fitted model is provided in Table 1. 

Calculating predictions on a finer resolution can be done as follows: 

R> grid.x <- seq( from = 0, to = 5, by = 0.05 ) 

R> grid.y <- seq( from = 0, to = 5, by = 0.05 ) 

R> grid.locations <- expand.grid( grid.x, grid.y ) 

R> pred.Iocs <- matrix( c( grid.locations[ , 1 J, grid.locations[ ,2 ] ), 

+ ncol = 2, byrow = F ) 

R> pred.NS.all <- predict( NSfit.model, pred.coords = pred.Iocs, 

+ pred.covariates = pred.Iocs ) 

R> pred.S.all <- predict( anisofit.model, pred.coords = pred.Iocs, 

+ pred.covariates = pred.Iocs ) 

Interpolated maps for the nonstationary model can be created with ggplot2 (Wickham 2009): 

R > ggplot( data.frame( preds = pred.NS.all$pred.means, 

+ xcoord = pred.Iocs [,1], ycoord = pred.Iocs [,2] ), 

+ aes( x = xcoord, y = ycoord, color = preds ) ) + 

+ coord_fixed( ratio = 1.25 ) + geom_point( size = 2.5 ) + 

+ scale_color_gradientn( colours = brewer,pal( 11, "RdYIBu" ), 

+ name = limits = c( -1.15, 8.7 ) ) + 

+ theme( panel.background = element_rect( fill = "white" ), 

+ panel.border = element_rect( colour = "black", fill = NA ), 

+ panel.grid = element_blank() ) 

R > ggplot( data.frame( preds = pred.NS.all$pred.SDs, 

+ xcoord = pred.Iocs[,1], ycoord = pred.Iocs [,2] ), 

+ aes( x = xcoord, y = ycoord, color = preds ) ) + 

+ coord_fixed( ratio = 1.25 ) + geom_point( size = 2.5 ) + 

+ scale_color_gradientn( colours = brewer,pal( 11, "RdYIBu" ), 

+ name = limits = c( 0.38, 0.78 ) ) + 

+ theme( panel.background = element_rect( fill = "white" ), 

+ panel.border = element_rect( colour = "black", fill = NA ), 

+ panel.grid = element_blank() ) 
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Figure 2: Predictions and prediction errors from the stationary model (a. and b.) and the 
nonstationary model (c. and d.). 


(similarly for the stationary predictions and standard errors); these plots are provided in Fig¬ 
ure 2. To visualize the locally-estimated anisotropy, the mixture component kernel matrices 
can be plotted along with the stationary anisotropy ellipse using 

R> plot( NSfit.model, fit.radius = 2.3, xlab = "", ylab = 

+ aniso.mat = anisofit.model$aniso.mat, asp = 1.25, 

+ true.me = simdata$mc.kernels, true.col = 1, 

+ aniso.col = 4, ns.col = 2, xlim = c( -1, 6 ), ylim = c( -1, 6 ) ) 

(Note: as with predict, plot is an alias for plot.NSconvo for an object of class NSconvo; 
similar syntax will be used throughout.) This plot is shown in Figure 3; since the true 
anisotropy ellipses are known, these are also plotted. Note that the estimated nonstationary 
ellipses (solid red) compare quite favorably with the true ellipses (black); furthermore, the 
stationary ellipse (dashed blue) appears to be approximately an “average” of the spatially- 
varying ellipses. 














































Mark D. Risser, Catherine A. Calder 


21 



Figure 3: True mixture component ellipses (solid black) with fit radius (dashed gray), non¬ 
stationary ellipses (solid red), and the stationary ellipse (dashed blue). 



Figure 4: Estimated correlations for a reference point, showing the nonstationary (left) and 
stationary (center) models, as well as the true correlation (right). 


As an additional visualization of the estimated nonstationarity, estimated correlation plots 
for a particular reference point can be obtained by 
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Fit radius 


Figure 5: Computational time (in seconds) for fitting the nonstationary model to the simulated 
data (of size n = 565) across three different mixture component grids (of size K = 4, 9,16, 
respectively) and a range of fit radii. The solid and dashed black lines represents the compu¬ 
tational time needed to fit the stationary model using, respectively, the convoSPAT and geoR 
packages. Computational times given are for a Dual Quad Core Xeon 2.66GHz machine with 
32GB RAM. 


R> plot( NSfit.model, plot.ellipses = FALSE, asp = 1.25, 

+ ref.loc = c( 1.5, 3.5 ), all.pred.Iocs = pred.locs, 

+ col = diverge_hsv( 100 ) ) 

R> plot( anisofit.model, asp = 1.25, ref.loc = c( 1.5, 3.5 ), 

+ all.pred.locs = pred.locs, col = diverge_hsv( 100 ) ) 

and are given in Figure 4 for the nonstationary and stationary models, as well as the true 
correlation for comparison. Note that the stationary model estimates an elliptical correla¬ 
tion pattern, while the nonstationary model captures the non-elliptical nature of the true 
correlation pattern. 

Finally, note that the nonstationary model outperforms the stationary model in terms of both 
CRPS and MSPE (see Table 1). 

6.3. Computational considerations 

Selection of the size and placement of the mixture component grid and the fit radius have a 
major impact on the computational time. As a demonstration of this relationship, consider 
the computational times for a variety of choices for the mixture component grid and fit radius, 
summarized in Figure 5. Three mixture component grids were chosen: “coarse” (with K = 4), 
“true” (with K = 9), and “fine” (with K = 16). For each of these grids, the nonstationary 
model (with spatially-varying nugget and variance) was fit using fit radii ranging between 
r = 0.5 and r = 3. As a reference point, the average number of data points used to fit 
the locally stationary models increases linearly from around 20 for r = 0.5 to 350 for r = 3 
(consistent across the different mixture component grids). 

The linear trends on the log scale in Figure 5 are expected, as calculations for Gaussian 
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Longitude 


Figure 6: Annual precipitation data for 1997, in log mm. 


process models are known to increase with the square of the sample size. The (approximately) 
vertical shifts present in moving from coarse to true to fine are also expected, as the finer 
grids represent fitting more local models with (approximately) the same local sample sizes. 
For comparison, the time required to fit the stationary model using Aniso_fit is also shown 
on Figure 5 as a solid black line. However, as mentioned in Section 5.2, if the user is interested 
in fitting a stationary model to a spatial data set, a much better choice is likfit in geoR 
(Ribeiro Jr. and Diggle 2015). The corresponding stationary (anisotropic) model in geoR can 
be fit by 

R> geoR.fit <- likfit( data = simdata$sim.data[ -simdata$holdout.index, 1 ], 

+ coords = simdata$sim.locations[ -simdata$holdout.index, J, 

+ cov.model = "exponential", ini.cov.pars = c( 1, 1 ), trend = "1st", 

+ fix.psiA = FALSE, fix.psiR = FALSE, lik.method = "REML" ) 

The computational time for likfit is also shown on Figure 5 as a dashed black line. 


7. Example 2: Annual precipitation 

The nonstationary model proposed in Sections 3 and 4 is also applied to a moderately large, 
real data set, consisting of the total annual precipitation in the western United States for 1997. 
The data is available online from the National Center for Atmospheric Research (http:// 
www.image.ucar.edu/GSP/Data/US.monthly.met) as part of a larger data set that includes 
measurements for the entire United States. For the purposes of this analysis, a subset of the 
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Label 

Covariance model details 

Stationary 

Anisotropic, constant nugget and variance 

NS1 

SV anisotropy, constant nugget and variance 

NS2 

SV anisotropy and variance, constant nugget 

NS3 

SV anisotropy and nugget, constant variance 

NS4 

SV anisotropy, variance, and nugget 


Table 2: A brief summary of the different models fit to the precipitation data. Note: “SV” 
indicates “spatially-varying.” 


data that includes the western United States was chosen (see Figure 6) because precipitation 
is smoother and more densely observed over the central and eastern United States. This 
subset is included in the package as an RData (named USprecip97) file and consists of 1270 
observations. For illustration purposes, we work with non-projected longitude and latitude, 
but note that due to the size of the study area this may be inappropriate. 

7.1. Spatial model summaries 

A total of five spatial models were fit to this dataset, the details of which are summarized in 
Table 2: the stationary model and four nonstationary models. All models were assigned the 
same mean structure, which included the main effects of longitude and latitude as well as an 
intercept. Additionally, all models were assigned the same underlying correlation structure, 
chosen to be exponential (i.e., the Matern class with fixed k = 0.5). Twenty percent of the 
observations (m = 254) were held out as a validation data set in order to evaluate each model, 
leaving n = 1016 observations as a training data set. The stationary model was fit using the 
Aniso_fit function and the nonstationary models were fit using the NSconvo_fit function; 
the four nonstationary models represent all combinations of the ns. nugget and ns . variance 
options. 

In addition to selecting which of the nugget variance and/or process variance should be 
spatially-varying, recall that using this model requires specifying (1) a mixture component 
grid, (2) the tuning parameter for the weight function, and (3) the fitting radius r. For each 
of the nonstationary models, the model was actually fit multiple times using two different 
mixture component grids (“coarse”, with I\ = 15, and “fine”, with I\ = 22), four different 
values of the tuning parameter (X w = 1.00,2.67,4.33,6.00), and six different fit radii (for the 
coarse grid, r = 2.5, 3.2, 3.9,4.6, 5.3, 6.0; for the fine grid, r = 2.75,3.20,3.65,4.10,4.55,5.00). 

Of the 2 x 4 x 6 = 48 total models fit for each of the four nonstationary models, the best 
model was selected based on maximizing the CRPS criteria (for the test data using the log 
precipitation); the best models are summarized in Table 3. For each model, the coarse grid 
with a fit radius of r = 3.9 performed best, and the largest tuning parameter was preferred 
for all but model NS1 (however, the CRPS for NS1 with X w = 6 was -0.13714, which is only 
slightly smaller than the CRPS for \ w = 4.33, which was -0.13709). Table 3 also provides the 
computational run time for each model. Parameter estimates for each of the best models are 
summarized in Table 4. 

While the gains are modest, all of the nonstationary models outperform the stationary model 
in terms of both MSPE and CRPS; the best model in terms of both MSPE and CRPS is NS1. 
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Model 

Grid size 


r 

CRPS 

MSPE 

Comp, time 

Stationary 

- 

- 

- 

-0.1424 

0.0666 

85.94 min. 

NS1 

15 (coarse grid) 

4.33 

3.9 

-0.1371 

0.0610 

17.02 min. 

NS2 

15 (coarse grid) 

6.0 

3.9 

-0.1387 

0.0619 

16.31 min. 

NS3 

15 (coarse grid) 

6.0 

3.9 

-0.1377 

0.0619 

16.55 min. 

NS4 

15 (coarse grid) 

6.0 

3.9 

-0.1386 

0.0623 

9.49 min. 


Table 3: Model details and evaluation for the best model of each type fit to the precipitation 
data, selected based on maximizing CRPS. The computational time given is for a Dual Quad 
Core Xeon 2.66GHz machine with 32GB RAM. 


Parameter 

S 

NS1 

NS2 

NS3 

NS4 

A) (int.) 

-5.315 

-2.802 

-1.138 

-2.916 

-1.167 

fix (x-coord) 

-0.067 

-0.037 

-0.028 

-0.039 

-0.029 

/?2 (y-coord) 

0.034 

0.055 

0.0038 

0.053 

0.038 

Ai 

2.772 

SV 

SV 

SV 

SV 

V 

6.385 

SV 

SV 

SV 

SV 

V 

0.336 

SV 

SV 

SV 

SV 

T 2 

0.011 

0.005 

0.013 

SV 

SV 

a 2 

0.379 

0.289 

SV 

0.275 

SV 


Table 4: Parameter estimates for the five best spatial models fit to the precipitation data 
set, indicating which parameters are spatially-varying (SV). Recall that with an exponential 
correlation structure, the smoothness n is fixed to be 0.5 for all models. 


The interpolated prediction maps and corresponding standard errors for the stationary model 
and NS1 are given in Figure 7. Note that the nonstationary model estimates the prediction 
errors much more flexibly, with larger errors in the far southwest (southern California, south¬ 
ern Nevada, and western Arizona) and smaller errors in the northern portion of the domain. 
The stationary model prediction errors are much more homogeneous, being highly dependent 
on the proximity of neighboring observations. 

7.2. Visualizations of nonstationarity 

Graphical summaries can be made to visualize the spatially-varying parameter estimates 
not explicitly provided in Table 4. The fully nonstationary model NS4 will be used for the 
following visualizations, as all of its variance/covariance parameters are spatially-varying. 

First, consider the locally-estimated anisotropy ellipses for each of the mixture component 
locations, shown by the solid red ellipses in Figure 8. This plot also contains the correspond¬ 
ing ellipse from the stationary model (dashed blue), which is constant over space, and the 
neighborhood size (dotted gray) defined by the fit radius r. It is quite clear that precipitation 
in the western United States displays nonstationary behavior with respect to the anisotropy 
ellipses, as the local estimates are highly variable across the spatial domain. 

Next, consider the estimates of the spatially-varying nugget variance (r 2 ) and process variance 
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Figure 7: Predictions and prediction standard errors for the stationary model (plots (a) and 
(b)) and the nonstationary model NS1 (plots (c) and (d)). 


(<r 2 ) over the spatial region for NS4, shown in Figure 9. The nugget variance is highly variable 
across the spatial domain, which may reflect variability in the monitoring devices. The process 
variance is largest along the coastal United States, which is intuitive based on its proximity 
to the ocean and highly variable topography. In general, areas with larger process variance 
correspond to smaller nugget variance, although this is not true everywhere (e.g., central 
Montana). 

Finally, estimated correlation plots for three reference points are given in Figure 10. This plot 
nicely illustrates the fact that the nonstationary model allows the spatial dependence structure 
to change over space, while the stationary model estimates a constant correlation structure. 
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Figure 8: Estimated mixture component ellipses for the nonstationary model (red), the 
stationary (anisotropic) model (blue), and the estimation region (dashed black). 
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Figure 9: Plots of the estimated spatially-varying process variance a 2 (right) and nugget 
variance r 2 (left) for model NS4. 
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Figure 10: Correlation plots for three reference points, comparing the stationary model S 
(top) and nonstationary model NS4 (bottom). 


In addition to allowing the orientation of the anisotropy ellipses to change over space (as seen 
in Figure 8), the nonstationary model allows for non-elliptical correlation patterns. 


8. Discussion 

In this paper, we have presented a nonstationary spatial Gaussian process model that is highly 
flexible yet amenable to computationally efficient inference, as shown through its implemen¬ 
tation in the new convoSPAT package for R. In fact, for a moderately large real data set, 
fitting the nonstationary model is significantly faster than fitting the stationary model (at 
least using Aniso_f it; see Table 3). The model also allows for visualization of the estimated 
nonstationarity, for both the spatially-varying variance parameters and correlation structure. 

The tradeoff for the computational tractability of this model is that uncertainty in the locally- 
estimated parameters is not accounted for in global estimation and, more seriously, this uncer¬ 
tainty is not quantified in the parameter estimation. Furthermore, the model is not completely 
pre-packaged, in that the user must specify the mixture component grid, fit radius, and tuning 
parameter, and the resulting parameter estimates and predictions may be sensitive to these 
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choices. Since uncertainty in parameter estimates is not provided, it is difficult to determine 
if a nonstationary model is needed or if a stationary model would be sufficient. However, 
the primary goal of this model is to provide a “quick and dirty” way to fit a nonstationary 
Gaussian process model to spatial data, allowing new nonstationary methods to be compared 
with another model that is also nonstationary, instead of simply a stationary model. And, 
since the model can be fit very quickly, a practitioner can fit the model for many choices of the 
mixture component grid, fit radius, and tuning parameter (similar to the strategy in Section 
7.1) and use the provided evaluation criteria (or other criteria) to choose a final model. 

Finally, while this model provides a fast approach for modeling nonstationary spatial Gaus¬ 
sian processes, note that the same model could gain an additional degree of computational 
efficiency by parallelizing the LLE component of the model. Currently, the local parameters 
are estimated sequentially; however, the optimization for each mixture component location 
does not depend on the optimization for the other mixture component locations. Thus, a more 
efficient algorithm might estimate all of these components simultaneously, greatly reducing 
the overall computation time even further. A future version of convoSPAT will explore this 
possibility. 
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